Unsupervised learning on illicit trade data
Alice Lépissier
Bren School of Environmental Science and Management
Policy and media attention on illicit financial flows (IFF) has increased, with the recognition that Africa is a net creditor to the world.
listOfImageNames = ['Images/NYT_op-ed_2013.PNG',
'Images/Guardian headline_2017.PNG',
'Images/Guardian headline_2015.PNG',
'Images/Economist headline_2019.PNG']
for imageName in listOfImageNames:
display(Image(filename=imageName))
What is trade mis-invoicing?
Motivations for trade mis-invoicing include:
Why does trade mis-invoicing matter?
display(Image(filename='Images/Governance_loop.png'))
Data source
Zoom in on Africa
display(Image(filename='Images/GER Total Average Africa IFF Percent Trade high.png'))
Goals
Mis-invoiced trade for countries by sectors
IFF_Sector_Imp = IFF_Sector[['section', 'Imp_IFF_hi']]
IFF_Sector_Exp = IFF_Sector[['section', 'Exp_IFF_hi']]
IFF_Sector_Imp
| section | Imp_IFF_hi | ||
|---|---|---|---|
| reporter.ISO | year | ||
| DZA | 2001 | Animal and Animal Products | 1.914633e+08 |
| 2001 | Pulp of Wood or of Other Fibrous Material | 7.436143e+07 | |
| 2001 | Textiles | 3.560644e+07 | |
| 2001 | Footwear and Headgear | 1.146507e+06 | |
| 2001 | Stone, Glass, and Ceramics | 2.935880e+07 | |
| ... | ... | ... | ... |
| ZWE | 2007 | Arms and Ammunition | 0.000000e+00 |
| 2009 | Works of Art | 0.000000e+00 | |
| 2013 | Miscellaneous Manufactured Articles | 0.000000e+00 | |
| 2014 | Miscellaneous Manufactured Articles | 0.000000e+00 | |
| 2014 | Mineral Products | 0.000000e+00 |
11133 rows × 2 columns
Mis-invoiced trade for dyads
IFF_Dest_Imp = IFF_Dest[['partner.ISO', 'Imp_IFF_hi']]
IFF_Dest_Exp = IFF_Dest[['partner.ISO', 'Exp_IFF_hi']]
IFF_Dest_Imp
| partner.ISO | Imp_IFF_hi | ||
|---|---|---|---|
| reporter.ISO | year | ||
| DZA | 2001 | AND | 1.609561e+04 |
| 2001 | ARG | 4.717459e+07 | |
| 2001 | AUS | 2.027641e+07 | |
| 2001 | AUT | 7.641706e+07 | |
| 2001 | BEL | 2.285729e+07 | |
| ... | ... | ... | ... |
| ZWE | 2014 | MDG | 0.000000e+00 |
| 2014 | SGP | 0.000000e+00 | |
| 2014 | CHE | 0.000000e+00 | |
| 2014 | ARE | 0.000000e+00 | |
| 2015 | UGA | 0.000000e+00 |
34757 rows × 2 columns
More data wrangling
crosswalk = pd.read_excel("Data/crosswalk.xlsx").rename(columns={'Country': 'country'})
crosswalk.head()
| ISO3166.3 | ISO3166.2 | country | UN_Region | UN_Region_Code | UN_Sub-region | UN_Sub-region_Code | UN_Intermediate_Region | UN_Intermediate_Region_Code | UN_M49_Code | ... | WB_Income_Group_Code | WB_Region | WB_Lending_Category | WB_Other | OECD | EU28 | Arab League | Commonwealth | Longitude | Latitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | AW | Aruba | Americas | 19.0 | Latin America and the Caribbean | 419.0 | Caribbean | 29.0 | 533.0 | ... | HIC | Latin America and Caribbean | .. | NaN | 0 | 0 | 0 | 0 | -69.982677 | 12.520880 |
| 1 | AFG | AF | Afghanistan | Asia | 142.0 | Southern Asia | 34.0 | NaN | NaN | 4.0 | ... | LIC | South Asia | IDA | HIPC | 0 | 0 | 0 | 0 | 66.004734 | 33.835231 |
| 2 | AFG | AF | Afghanistan, Islamic Republic of | Asia | 142.0 | Southern Asia | 34.0 | NaN | NaN | 4.0 | ... | LIC | South Asia | IDA | HIPC | 0 | 0 | 0 | 0 | 66.004734 | 33.835231 |
| 3 | AGO | AO | Angola | Africa | 2.0 | Sub-Saharan Africa | 202.0 | Middle Africa | 17.0 | 24.0 | ... | LMC | Sub-Saharan Africa | IBRD | NaN | 0 | 0 | 0 | 0 | 17.537368 | -12.293361 |
| 4 | AIA | AI | Anguila | Americas | 19.0 | Latin America and the Caribbean | 419.0 | Caribbean | 29.0 | 660.0 | ... | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | -63.064989 | 18.223959 |
5 rows × 25 columns
def create_features(data, values, features, obs):
"""
TODO: add doc string
"""
features_data = data.pivot_table(values=values,
columns=features,
index=[obs, 'year'],
fill_value=0)
return features_data
def biplot_PCA(features_data, nPC=2, firstPC=1, secondPC=2, obs='reporter.ISO', show_loadings=False):
"""Projects the data in the 2-dimensional space spanned by 2 principal components
chosen by the user, along with a bi-plot of the top 3 loadings per PC, and colors
by class label.
Args:
features_data: data-set of features
nPC: number of principal components
firstPC: integer denoting first principal component to plot in bi-plot
secondPC: integer denoting second principal component to plot in bi-plot
obs: string denoting index of class labels (in features_data)
show_loadings: Boolean indicating whether PCA loadings should be displayed
Returns:
plot (static)
pca_loadings (if show_loadings=True)
"""
# Run PCA (standardize data beforehand)
features_data_std = StandardScaler().fit_transform(features_data)
# features_data_std = power_transform(features_data, method='yeo-johnson', standardize=True)
pca = PCA(n_components=nPC, random_state=234)
princ_comp = pca.fit_transform(features_data_std)
# Extract PCA loadings
cols = ['PC' + str(c+1) for c in np.arange(nPC)]
pca_loadings = pd.DataFrame(pca.components_.T,
columns=cols,
index=list(features_data.columns))
# Extract PCA scores
pca_scores = pd.DataFrame(princ_comp,
columns=cols)
pca_scores[obs] = features_data.reset_index()[obs].values.tolist()
pca_scores['year'] = features_data.reset_index()['year'].values.tolist()
score_PC1 = princ_comp[:,firstPC-1]
score_PC2 = princ_comp[:,secondPC-1]
# Generate plot data
if obs == 'reporter.ISO':
plot_data = pd.merge(pca_scores, obs_info, on=[obs, 'year'])
color_obs = 'reporter'
tooltip_obs = ['reporter', 'year', 'Income group (World Bank)', 'Country status (UN)']
else:
plot_data = pca_scores
color_obs = 'section'
tooltip_obs = ['section', 'year']
# Return chosen PCs to plot
PC1 = 'PC'+str(firstPC)
PC2 = 'PC'+str(secondPC)
# Extract top loadings (in absolute value)
# TO DO: use dict to iterate over
toploadings_PC1 = pca_loadings.apply(lambda x: abs(x)).sort_values(by=PC1).tail(3)[[PC1, PC2]]
toploadings_PC2 = pca_loadings.apply(lambda x: abs(x)).sort_values(by=PC2).tail(3)[[PC1, PC2]]
originsPC1 = pd.DataFrame({'index':toploadings_PC1.index.tolist(),
PC1: np.zeros(3),
PC2: np.zeros(3)})
originsPC2 = pd.DataFrame({'index':toploadings_PC2.index.tolist(),
PC1: np.zeros(3),
PC2: np.zeros(3)})
toploadings_PC1 = pd.concat([toploadings_PC1.reset_index(), originsPC1], axis=0)
toploadings_PC2 = pd.concat([toploadings_PC2.reset_index(), originsPC2], axis=0)
toploadings_PC1[PC1] = toploadings_PC1[PC1]*max(score_PC1)*1.5
toploadings_PC1[PC2] = toploadings_PC1[PC2]*max(score_PC2)*1.5
toploadings_PC2[PC1] = toploadings_PC2[PC1]*max(score_PC1)*1.5
toploadings_PC2[PC2] = toploadings_PC2[PC2]*max(score_PC2)*1.5
# Project top 3 loadings over the space spanned by 2 principal components
lines = alt.Chart().mark_line().encode()
for color, i, dataset in zip(['#440154FF', '#21908CFF'], [0,1], [toploadings_PC1, toploadings_PC2]):
lines[i] = alt.Chart(dataset).mark_line(color=color).encode(
x= PC1 +':Q',
y= PC2 +':Q',
detail='index'
).properties(
width=500,
height=500
)
# Add labels to the loadings
text=alt.Chart().mark_text().encode()
for color, i, dataset in zip(['#440154FF', '#21908CFF'], [0, 1], [toploadings_PC1[0:3], toploadings_PC2[0:3]]):
text[i] = alt.Chart(dataset).mark_text(
align='left',
baseline='bottom',
color=color
).encode(
x= PC1 +':Q',
y= PC2 +':Q',
text='index'
)
# Scatter plot colored by observation class label
points = alt.Chart(plot_data).mark_circle(size=60).encode(
x=alt.X(PC1, axis=alt.Axis(title='Principal Component ' + str(firstPC))),
y=alt.X(PC2, axis=alt.Axis(title='Principal Component ' + str(secondPC))),
color=alt.Color(color_obs, scale=alt.Scale(scheme='category20b'),
legend=alt.Legend(orient='right')),
tooltip=tooltip_obs
).interactive()
# Bind it all together
chart = (points + lines[0] + lines[1] + text[0] + text[1])
chart.display()
if show_loadings:
return pca_loadings
def scree_plot(features_data, show_explained_var=False):
"""
TODO: add comments and docstring
"""
features_data_std = StandardScaler().fit_transform(features_data)
pca = PCA(n_components=features_data_std.shape[1], random_state=234)
princ_comp = pca.fit_transform(features_data_std)
explained_var = pd.DataFrame({'PC': np.arange(1,features_data_std.shape[1]+1),
'var': pca.explained_variance_ratio_,
'cumvar': np.cumsum(pca.explained_variance_ratio_)})
# Adapted from https://altair-viz.github.io/gallery/multiline_tooltip.html
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['PC'], empty='none')
# The basic line
line = alt.Chart(explained_var).mark_line(interpolate='basis', color='#FDE725FF').encode(
alt.X('PC:Q',
scale=alt.Scale(domain=(1, len(explained_var))),
axis=alt.Axis(title='Principal Component')
),
alt.Y('cumvar:Q',
scale=alt.Scale(domain=(min(explained_var['cumvar']), 1)),
axis=alt.Axis(title='Cumulative Variance Explained')
),
)
# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(explained_var).mark_point().encode(
x='PC:Q',
opacity=alt.value(0),
).add_selection(
nearest
)
# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)
# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'cumvar:Q', alt.value(' '))
)
# Draw a rule at the location of the selection
rules = alt.Chart(explained_var).mark_rule(color='gray').encode(
x='PC:Q',
).transform_filter(
nearest
)
# Put the five layers into a chart and bind the data
out = alt.layer(
line, selectors, points, rules, text
).properties(
title='Cumulative scree plot',
width=600, height=300
)
out.display()
if show_explained_var:
return explained_var[['PC', 'var']]
sector_features = create_features(IFF_Sector_Imp, 'Imp_IFF_hi',
features='section', obs='reporter.ISO')
sector_features
| section | Animal and Animal Products | Animal or Vegetable Fats and Oils | Arms and Ammunition | Base Metals | Chemicals and Allied Industries | Footwear and Headgear | Machinery and Electrical | Mineral Products | Miscellaneous Manufactured Articles | Pearls, Precious Stones and Metals | ... | Precision Instruments | Prepared Foodstuffs | Pulp of Wood or of Other Fibrous Material | Raw Hides, Skins, Leather, and Furs | Stone, Glass, and Ceramics | Textiles | Transportation | Vegetable Products | Wood and Wood Products | Works of Art | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| reporter.ISO | year | |||||||||||||||||||||
| AGO | 2009 | 2.694946e+08 | 7.652455e+07 | 67738.261242 | 1.025920e+09 | 5.115176e+08 | 2.920410e+07 | 1.525512e+09 | 1.727293e+09 | 2.472936e+08 | 5.575633e+05 | ... | 1.242156e+08 | 4.781184e+08 | 1.149992e+08 | 1.097769e+07 | 1.349018e+08 | 1.827787e+08 | 2.198838e+09 | 4.249722e+08 | 4.337043e+07 | 177640.929155 |
| 2010 | 2.064906e+08 | 7.311396e+07 | 0.000000 | 9.989842e+08 | 2.575443e+08 | 7.853065e+06 | 1.308337e+09 | 2.953387e+09 | 8.404302e+07 | 3.209177e+05 | ... | 1.115183e+08 | 2.342889e+08 | 7.332134e+07 | 4.013044e+06 | 7.601278e+07 | 8.285704e+07 | 9.611656e+08 | 1.959128e+08 | 2.310136e+07 | 404828.231824 | |
| 2011 | 2.995594e+08 | 8.058180e+07 | 89365.880480 | 6.689511e+08 | 2.931591e+08 | 1.172881e+07 | 1.389743e+09 | 2.286539e+09 | 4.413341e+07 | 9.645181e+05 | ... | 1.418956e+08 | 3.159980e+08 | 9.911835e+07 | 9.995669e+06 | 6.607126e+07 | 7.912925e+07 | 7.070474e+08 | 3.734589e+08 | 1.363601e+07 | 586586.948365 | |
| 2012 | 7.103770e+08 | 2.746924e+08 | 261265.895948 | 1.753970e+09 | 8.540407e+08 | 3.823724e+07 | 2.790368e+09 | 9.203666e+08 | 1.763904e+08 | 1.271289e+06 | ... | 2.073111e+08 | 1.025668e+09 | 2.641831e+08 | 1.394356e+07 | 1.630307e+08 | 1.683128e+08 | 1.975553e+09 | 6.541154e+08 | 3.980884e+07 | 739122.050820 | |
| 2013 | 4.031560e+08 | 1.541679e+08 | 188343.533148 | 9.230300e+08 | 4.793908e+08 | 1.212363e+07 | 1.428283e+09 | 2.009915e+09 | 4.871315e+07 | 6.250082e+05 | ... | 1.063123e+08 | 4.931687e+08 | 1.798533e+08 | 8.930246e+06 | 7.882264e+07 | 7.858108e+07 | 4.962937e+09 | 3.995114e+08 | 1.836680e+07 | 418841.640492 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZWE | 2011 | 3.098093e+07 | 6.060789e+07 | 9124.882478 | 1.693777e+08 | 2.260227e+09 | 2.817909e+06 | 3.231869e+08 | 2.747704e+08 | 9.872441e+06 | 3.618592e+06 | ... | 2.136919e+07 | 1.144497e+08 | 5.855387e+07 | 6.259512e+05 | 2.652474e+07 | 4.726781e+07 | 8.963644e+08 | 1.874874e+08 | 6.016952e+06 | 35481.251510 |
| 2012 | 2.784481e+07 | 5.549232e+07 | 856.631347 | 1.085453e+08 | 4.929549e+08 | 7.015919e+06 | 3.451931e+08 | 1.461844e+08 | 1.510668e+07 | 2.876014e+05 | ... | 3.162707e+07 | 2.025934e+08 | 6.024629e+07 | 1.412797e+06 | 2.482629e+07 | 5.147542e+07 | 9.468142e+08 | 1.900125e+08 | 7.901902e+06 | 59677.045599 | |
| 2013 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | ... | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2014 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | ... | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2015 | 1.458046e+07 | 7.135043e+07 | 19192.507302 | 8.703951e+07 | 2.299085e+08 | 4.468801e+06 | 3.437244e+08 | 1.401916e+08 | 2.432263e+07 | 3.672531e+07 | ... | 3.233606e+07 | 3.778318e+07 | 3.157795e+07 | 1.061420e+06 | 1.724449e+07 | 2.841926e+07 | 2.137692e+08 | 1.585549e+08 | 4.569325e+06 | 6305.070602 |
624 rows × 21 columns
biplot_PCA(sector_features, 10, 1, 2, obs='reporter.ISO', show_loadings=True)
| PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Animal and Animal Products | 0.223097 | -0.285686 | -0.052218 | 0.200337 | 0.280789 | -0.075192 | -0.159824 | 0.157986 | -0.133621 | 0.067581 |
| Animal or Vegetable Fats and Oils | 0.145805 | -0.041750 | 0.262100 | -0.281551 | -0.378544 | -0.470151 | 0.409970 | 0.471670 | 0.177591 | -0.031639 |
| Arms and Ammunition | 0.030559 | -0.044358 | 0.524677 | -0.654182 | 0.434354 | 0.055627 | -0.201368 | -0.113006 | -0.079142 | -0.115541 |
| Base Metals | 0.235966 | -0.206534 | -0.261180 | -0.147777 | -0.019680 | -0.153272 | 0.017922 | -0.247084 | 0.174715 | -0.069873 |
| Chemicals and Allied Industries | 0.271238 | 0.107591 | -0.117222 | -0.089472 | -0.096976 | -0.130645 | -0.071165 | -0.219172 | -0.198166 | 0.036517 |
| Footwear and Headgear | 0.206357 | 0.378112 | 0.015170 | -0.019961 | -0.001383 | -0.029500 | -0.042735 | 0.010544 | 0.026510 | -0.252435 |
| Machinery and Electrical | 0.272101 | 0.160466 | 0.076229 | 0.091921 | -0.067637 | -0.159494 | 0.022549 | -0.135941 | -0.157103 | -0.115312 |
| Mineral Products | 0.238785 | 0.071925 | -0.098357 | -0.201011 | -0.166463 | 0.073908 | -0.133348 | 0.121464 | -0.241190 | 0.675042 |
| Miscellaneous Manufactured Articles | 0.246123 | 0.283544 | 0.122775 | 0.033592 | -0.049743 | -0.014818 | -0.017630 | 0.069480 | -0.247438 | -0.043076 |
| Pearls, Precious Stones and Metals | 0.060193 | 0.251657 | -0.373387 | -0.153599 | 0.567863 | 0.055855 | 0.647708 | 0.088517 | -0.049224 | 0.089860 |
| Plastics and Rubbers | 0.264093 | -0.144764 | 0.124109 | 0.236111 | 0.112342 | 0.023331 | 0.062706 | 0.076398 | 0.119468 | -0.076649 |
| Precision Instruments | 0.223754 | 0.362288 | -0.013533 | 0.023957 | -0.084425 | -0.090796 | -0.080419 | -0.186415 | -0.262231 | -0.078130 |
| Prepared Foodstuffs | 0.252722 | -0.227483 | -0.012139 | 0.160146 | 0.177798 | -0.024717 | -0.088546 | 0.289203 | -0.187744 | -0.058613 |
| Pulp of Wood or of Other Fibrous Material | 0.272304 | -0.112890 | -0.010570 | 0.074023 | 0.024044 | -0.016865 | 0.003323 | -0.123810 | 0.135842 | 0.200798 |
| Raw Hides, Skins, Leather, and Furs | 0.206454 | 0.084348 | 0.178427 | 0.121865 | -0.101317 | 0.619649 | 0.102592 | 0.287981 | -0.057302 | -0.251296 |
| Stone, Glass, and Ceramics | 0.225603 | 0.111290 | 0.264504 | 0.269760 | 0.061210 | -0.018303 | 0.176815 | -0.368090 | 0.400752 | -0.018317 |
| Textiles | 0.209936 | -0.103101 | -0.054321 | -0.264334 | -0.273480 | 0.542184 | 0.126078 | -0.076346 | 0.216014 | 0.139611 |
| Transportation | 0.238443 | -0.119628 | 0.256225 | 0.081261 | 0.184539 | -0.059237 | 0.015553 | -0.095856 | 0.219874 | 0.334486 |
| Vegetable Products | 0.245486 | -0.284943 | -0.122225 | -0.029458 | 0.071482 | -0.036325 | -0.074223 | 0.202113 | -0.106308 | -0.240274 |
| Wood and Wood Products | 0.201178 | -0.222154 | -0.377723 | -0.300290 | -0.132165 | 0.007197 | -0.048891 | -0.185469 | 0.074259 | -0.351532 |
| Works of Art | 0.089055 | 0.388758 | -0.239308 | -0.078967 | 0.157495 | -0.041382 | -0.486227 | 0.369405 | 0.561157 | 0.018339 |
display(Image(filename='Images/Treemap sectors average Africa.png'))
biplot_PCA(sector_features, 10, 3, 4, obs='reporter.ISO')
biplot_PCA(sector_features, 10, 5, 6, obs='reporter.ISO')
scree_plot(sector_features, show_explained_var=True)
| PC | var | |
|---|---|---|
| 0 | 1 | 0.524988 |
| 1 | 2 | 0.121415 |
| 2 | 3 | 0.054159 |
| 3 | 4 | 0.049898 |
| 4 | 5 | 0.042107 |
| 5 | 6 | 0.041007 |
| 6 | 7 | 0.036014 |
| 7 | 8 | 0.028847 |
| 8 | 9 | 0.024146 |
| 9 | 10 | 0.018279 |
| 10 | 11 | 0.012558 |
| 11 | 12 | 0.009987 |
| 12 | 13 | 0.007011 |
| 13 | 14 | 0.006890 |
| 14 | 15 | 0.005213 |
| 15 | 16 | 0.004848 |
| 16 | 17 | 0.004001 |
| 17 | 18 | 0.002819 |
| 18 | 19 | 0.002489 |
| 19 | 20 | 0.001800 |
| 20 | 21 | 0.001524 |
country_features = create_features(IFF_Sector_Imp, 'Imp_IFF_hi',
features='reporter.ISO', obs='section')
country_features
| reporter.ISO | AGO | BDI | BEN | BFA | BWA | CAF | CIV | CMR | COG | COM | ... | STP | SWZ | SYC | TGO | TUN | TZA | UGA | ZAF | ZMB | ZWE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| section | year | |||||||||||||||||||||
| Animal and Animal Products | 2000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.00000 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000e+00 | 4.044600e+06 | 1.572640e+07 | 1.060197e+06 | 2.902091e+05 | 2.341209e+07 | 0.000000e+00 | 0.000000e+00 |
| 2001 | 0.000000 | 0.000000 | 9.464068e+06 | 2.013378e+06 | 4.018742e+06 | 351958.882652 | 7.103384e+07 | 2.552567e+07 | 0.00000 | 0.0 | ... | 0.000000 | 0.000000 | 3.439935e+07 | 1.087398e+07 | 0.000000e+00 | 0.000000e+00 | 2.522664e+05 | 1.854458e+07 | 1.832385e+06 | 0.000000e+00 | |
| 2002 | 0.000000 | 288779.572593 | 1.458450e+07 | 7.756380e+05 | 3.524820e+06 | 213560.077651 | 7.767896e+07 | 2.907597e+07 | 0.00000 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000e+00 | 6.574580e+06 | 9.664435e+06 | 0.000000e+00 | 6.944404e+05 | 1.452489e+07 | 4.354400e+05 | 7.346496e+06 | |
| 2003 | 0.000000 | 217920.960357 | 2.297721e+07 | 4.533317e+05 | 0.000000e+00 | 0.000000 | 1.286866e+08 | 3.485783e+07 | 0.00000 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000e+00 | 6.290272e+06 | 1.732785e+07 | 1.294665e+06 | 1.358896e+06 | 2.153838e+07 | 1.422125e+06 | 0.000000e+00 | |
| 2004 | 0.000000 | 0.000000 | 2.742553e+07 | 3.429694e+05 | 5.275531e+04 | 0.000000 | 9.283012e+07 | 4.940023e+07 | 0.00000 | 0.0 | ... | 0.000000 | 59632.763605 | 0.000000e+00 | 3.901862e+06 | 3.574647e+07 | 0.000000e+00 | 8.053124e+05 | 2.868016e+07 | 2.791570e+06 | 0.000000e+00 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Works of Art | 2012 | 739122.050820 | 0.000000 | 0.000000e+00 | 1.054676e+04 | 2.623004e+05 | 0.000000 | 1.820478e+04 | 3.875133e+05 | 0.00000 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000e+00 | 1.125696e+03 | 0.000000e+00 | 7.252236e+04 | 1.388514e+04 | 1.371564e+07 | 3.453691e+04 | 5.967705e+04 |
| 2013 | 418841.640492 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 2.282032e+04 | 0.000000 | 1.287629e+04 | 2.104949e+04 | 0.00000 | 0.0 | ... | 9584.734907 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 5.191721e+05 | 2.986963e+04 | 1.006657e+07 | 1.017682e+05 | 0.000000e+00 | |
| 2014 | 390140.730849 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 9.331590e+04 | 0.000000 | 1.017252e+06 | 1.826816e+05 | 8003.37246 | 0.0 | ... | 20941.755221 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 6.731774e+04 | 2.129568e+04 | 2.543377e+07 | 0.000000e+00 | 0.000000e+00 | |
| 2015 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 5.977903e+04 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.00000 | 0.0 | ... | 4447.368931 | 0.000000 | 5.857979e+05 | 0.000000e+00 | 3.088086e+03 | 5.385830e+05 | 1.193596e+04 | 6.697451e+06 | 5.843484e+04 | 6.305071e+03 | |
| 2016 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 4.195233e+04 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.00000 | 0.0 | ... | 1327.652396 | 0.000000 | 4.463501e+05 | 0.000000e+00 | 0.000000e+00 | 3.443084e+04 | 2.305443e+03 | 1.295357e+07 | 0.000000e+00 | 0.000000e+00 |
357 rows × 46 columns
biplot_PCA(country_features, 10, 1, 2, obs='section', show_loadings=True)
| PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| AGO | 0.172866 | 5.695754e-02 | 1.915290e-01 | -6.571267e-02 | -1.498597e-02 | 2.683983e-01 | -6.131957e-02 | -9.901379e-02 | -1.352142e-02 | 1.960619e-01 |
| BDI | 0.159396 | 1.399172e-02 | 1.100982e-01 | 1.072287e-01 | -5.780584e-02 | -3.006342e-02 | 1.311807e-01 | -6.306715e-02 | 8.380471e-02 | -1.595603e-01 |
| BEN | 0.166261 | 2.839527e-01 | -4.853377e-02 | -1.227506e-01 | -8.929173e-02 | 2.840864e-02 | -5.069704e-02 | -1.050768e-01 | 1.115173e-01 | -1.479526e-01 |
| BFA | 0.200912 | 1.961349e-01 | -1.479219e-02 | -3.284903e-02 | -7.687652e-03 | 1.947496e-01 | -8.961797e-03 | -9.812748e-02 | 5.298346e-02 | -6.100222e-02 |
| BWA | 0.058704 | 5.101605e-04 | 1.048248e-01 | 1.176491e-01 | 1.116177e-01 | 4.542630e-02 | 5.291913e-02 | 1.702242e-01 | 1.074927e-01 | 6.990029e-01 |
| CAF | 0.136213 | -2.941790e-01 | 7.144585e-02 | 1.466869e-01 | 1.005191e-01 | -8.211151e-04 | 1.767076e-02 | -9.841998e-02 | 4.459720e-02 | -6.455230e-02 |
| CIV | 0.170157 | -4.190252e-02 | -1.992766e-01 | -1.178971e-01 | -8.139640e-02 | 2.099028e-01 | 1.481461e-01 | -9.124417e-02 | 9.430036e-02 | -2.959315e-02 |
| CMR | 0.199563 | 6.685309e-03 | 2.723196e-02 | -1.591573e-01 | -1.158744e-01 | 4.651018e-02 | 4.997217e-02 | -4.900403e-02 | 6.736277e-02 | -4.601119e-02 |
| COG | 0.092230 | -1.314851e-01 | 3.146417e-01 | -1.361157e-01 | -2.737181e-01 | 2.682128e-01 | -4.522047e-02 | 1.802042e-02 | 4.328099e-02 | 3.111778e-02 |
| COM | 0.096409 | -2.063490e-01 | 2.921122e-01 | -2.226963e-01 | -7.053853e-02 | -1.456883e-01 | 8.379586e-02 | -6.576844e-02 | 9.280223e-02 | -1.472937e-01 |
| CPV | 0.162562 | -1.932140e-01 | 3.911200e-02 | -6.076881e-02 | 6.992487e-02 | 3.542767e-02 | 2.890140e-01 | -1.259045e-01 | 5.085305e-02 | -4.426356e-02 |
| DZA | 0.097941 | 4.125012e-02 | -3.694645e-02 | 4.185959e-01 | -2.291923e-01 | -2.224267e-01 | -7.905464e-02 | 8.880404e-02 | -9.011562e-02 | -5.841375e-02 |
| EGY | 0.175564 | 1.852732e-01 | 9.778806e-02 | 1.870187e-01 | 8.946189e-03 | -2.924499e-03 | 1.181851e-02 | -4.328654e-02 | 4.339753e-03 | -1.929171e-01 |
| ETH | 0.121994 | -1.897909e-01 | 7.695418e-02 | 3.167105e-01 | 2.484336e-01 | -1.802844e-01 | -7.872203e-02 | -9.597359e-02 | -2.026803e-02 | -1.094738e-01 |
| GAB | 0.042052 | -1.989049e-01 | -3.375917e-01 | 1.247060e-02 | -2.588424e-01 | -4.558121e-02 | 8.313195e-02 | 9.621236e-02 | -7.997286e-02 | -1.514322e-01 |
| GHA | 0.107784 | -1.996829e-01 | -6.503997e-02 | 6.007675e-02 | -3.272539e-01 | 2.451643e-01 | 4.926298e-02 | 2.807364e-01 | -3.340735e-01 | -5.059900e-03 |
| GIN | 0.052400 | -1.350973e-01 | -3.351174e-01 | -8.155761e-02 | -4.401417e-02 | -4.334381e-02 | -3.444465e-01 | -1.967460e-02 | 1.193137e-01 | -2.723141e-02 |
| GMB | 0.162689 | 8.598690e-02 | -9.929674e-02 | -2.316451e-01 | 2.262847e-01 | 1.330992e-01 | 2.662606e-01 | 1.742886e-01 | -9.401992e-02 | -6.012329e-02 |
| GNB | 0.000016 | -1.996604e-02 | -8.847847e-02 | -2.131943e-02 | 6.445089e-02 | 9.726163e-02 | -2.861441e-01 | 3.631337e-01 | 5.592464e-01 | -2.128313e-01 |
| KEN | 0.117135 | -1.733487e-01 | -2.826062e-01 | -1.414929e-01 | -4.852900e-02 | 1.122855e-01 | -1.773947e-01 | -2.061400e-01 | -2.190356e-01 | 1.721080e-01 |
| LBY | 0.000000 | 0.000000e+00 | -2.584939e-26 | -1.654361e-24 | 2.710505e-20 | 2.710505e-20 | 4.336809e-19 | -6.938894e-18 | 5.551115e-17 | -2.775558e-17 |
| LSO | 0.031001 | -1.145362e-01 | 1.172217e-01 | -8.783260e-02 | -6.316290e-02 | 1.845925e-01 | -1.155909e-01 | 4.273390e-01 | 1.846813e-01 | -2.597209e-02 |
| MAR | 0.196450 | -1.442017e-03 | -1.407898e-01 | -1.151253e-01 | 4.096951e-02 | -2.501230e-01 | -1.218791e-01 | -1.159986e-01 | -2.772756e-02 | 9.286997e-02 |
| MDG | 0.187159 | 1.768800e-01 | -7.603241e-02 | 1.954600e-02 | 3.882583e-02 | -6.632947e-02 | 1.303028e-02 | 1.697959e-01 | 3.997142e-02 | -1.389174e-01 |
| MLI | 0.160911 | -1.212446e-01 | -4.690084e-02 | -1.486856e-01 | 8.742975e-02 | -2.762972e-01 | 8.729569e-02 | 2.434701e-01 | -8.316575e-02 | 3.368757e-02 |
| MOZ | 0.147230 | 1.608981e-01 | 9.473738e-02 | -1.983947e-02 | -2.406356e-01 | -2.654582e-01 | -2.507807e-03 | 5.991142e-02 | 9.261240e-02 | 1.184466e-01 |
| MRT | 0.142351 | 2.001775e-01 | 7.014463e-02 | -1.332407e-01 | -2.101042e-01 | -3.003770e-01 | -1.237907e-01 | -9.258926e-02 | 7.149717e-02 | 1.249696e-01 |
| MUS | 0.183529 | 7.441746e-02 | -1.253785e-01 | 7.372575e-02 | 1.275615e-01 | 4.308231e-02 | 8.698864e-02 | 1.594759e-01 | 3.117765e-02 | 8.570793e-02 |
| MWI | 0.196370 | 1.208545e-01 | -1.804584e-02 | -1.462830e-01 | 2.430288e-02 | 7.000297e-02 | -7.395843e-02 | -3.780468e-02 | -6.553461e-02 | -1.841761e-02 |
| NAM | 0.173960 | 1.201903e-01 | 5.978034e-02 | 9.556719e-02 | 2.108371e-01 | -2.454492e-02 | 2.251590e-01 | 1.968563e-01 | -5.019903e-04 | 1.281223e-01 |
| NER | 0.191899 | -1.071062e-01 | 1.036822e-01 | -3.490165e-02 | 8.799988e-03 | 4.101665e-02 | -1.036213e-01 | -1.501394e-01 | 5.717845e-02 | 7.629436e-02 |
| NGA | 0.153000 | -8.916565e-02 | 2.197113e-01 | 2.082393e-02 | -2.395585e-01 | -1.659439e-01 | -1.019044e-01 | 1.527359e-01 | -3.809505e-02 | 8.557721e-02 |
| RWA | 0.186734 | -3.747262e-02 | 1.608489e-01 | 2.275355e-01 | 3.285765e-02 | 9.220145e-02 | -1.415166e-01 | -1.482959e-01 | -3.582179e-03 | 2.383508e-02 |
| SDN | 0.122997 | -2.804786e-01 | 1.221158e-01 | -1.345090e-01 | -5.996290e-02 | -1.354197e-01 | -2.219003e-02 | 1.378054e-01 | -7.715869e-02 | 7.151525e-04 |
| SEN | 0.198048 | -3.415201e-02 | -1.017232e-01 | -1.030800e-01 | -1.082653e-01 | -2.190300e-02 | -1.642827e-02 | -9.561341e-02 | -9.388535e-02 | 5.110459e-02 |
| SLE | 0.000000 | -5.204546e-35 | 8.399678e-33 | 1.376288e-30 | -3.947723e-26 | -8.283784e-26 | -3.475898e-24 | 3.838377e-22 | -4.456022e-21 | -1.660979e-21 |
| STP | 0.093915 | 2.692090e-01 | 7.292042e-02 | 2.717052e-01 | -2.160666e-01 | 2.404257e-01 | -1.858177e-02 | -4.529530e-03 | -4.636809e-02 | -6.325914e-02 |
| SWZ | 0.004459 | -9.037762e-02 | -2.172397e-01 | 4.529566e-02 | -1.475804e-01 | -4.240062e-03 | 1.559841e-01 | -2.357850e-01 | 5.362103e-01 | 2.615336e-01 |
| SYC | 0.012375 | -7.886680e-02 | -1.900444e-01 | 2.392704e-01 | -2.750048e-01 | -4.851200e-02 | 5.049505e-01 | 2.850039e-02 | 1.579691e-01 | 1.255474e-02 |
| TGO | 0.184941 | 8.829184e-02 | -4.195564e-02 | 1.516059e-02 | -3.407124e-02 | -7.070676e-02 | -2.815878e-02 | -5.328573e-03 | -7.180032e-03 | -5.552878e-02 |
| TUN | 0.189500 | -1.490447e-01 | -5.294429e-02 | 1.368356e-01 | 1.828756e-01 | 1.829014e-02 | -1.057055e-01 | 1.539954e-01 | 1.181978e-02 | -9.518775e-04 |
| TZA | 0.198557 | 1.129045e-01 | -8.978178e-02 | -1.479353e-01 | 1.914884e-01 | -8.737537e-02 | 1.348190e-01 | 8.895631e-02 | -7.496216e-02 | 2.430973e-02 |
| UGA | 0.220482 | 1.310309e-01 | -7.584076e-02 | 3.261941e-02 | 7.776106e-02 | 2.816737e-02 | -2.272271e-02 | 3.592921e-03 | -2.836216e-02 | -7.275351e-02 |
| ZAF | 0.190412 | -7.411430e-02 | -1.422615e-01 | 1.624323e-01 | 5.822959e-02 | -5.136024e-02 | -2.069728e-01 | 5.163535e-03 | -1.084788e-02 | 1.159689e-01 |
| ZMB | 0.150023 | -1.782886e-01 | -7.122649e-02 | 2.088026e-01 | 1.466777e-01 | 2.720530e-01 | -5.921551e-02 | -1.225385e-01 | 6.453899e-03 | -3.512622e-02 |
| ZWE | 0.098964 | -1.998928e-01 | 2.092424e-01 | -4.305728e-02 | 1.206815e-01 | -8.504574e-02 | 1.333520e-01 | -1.496703e-01 | 1.545886e-01 | -1.980936e-01 |
biplot_PCA(country_features, 10, 3, 4, obs='section')
biplot_PCA(country_features, 10, 5, 6, obs='section')
scree_plot(country_features)
fig, axes = joypy.joyplot(country_features.iloc[:,0:22], colormap=plt.cm.magma, figsize=(8,8));
fig, axes = joypy.joyplot(country_features.iloc[:,22:47], colormap=plt.cm.magma, figsize=(8,8));
country_features_log = country_features.apply(lambda x: np.log(x+1) if np.issubdtype(x.dtype, np.number) else x, axis=0)
fig, axes = joypy.joyplot(country_features_log.iloc[:,0:22], colormap=plt.cm.magma, figsize=(8,8));
fig, axes = joypy.joyplot(country_features_log.iloc[:,22:47], colormap=plt.cm.magma, figsize=(8,8));
country_features_yeo = power_transform(country_features, method='yeo-johnson', standardize=True)
country_features_yeo = pd.DataFrame(country_features_yeo,
index=country_features.index,
columns=country_features.columns)
fig, axes = joypy.joyplot(country_features_yeo.iloc[:,0:22], colormap=plt.cm.magma, figsize=(8,8));
fig, axes = joypy.joyplot(country_features_yeo.iloc[:,22:47], colormap=plt.cm.magma, figsize=(8,8));
biplot_PCA(country_features_log, 10, 1, 2, obs='section')
biplot_PCA(country_features_yeo, 10, 1, 2, obs='section')
partner_features = create_features(IFF_Dest_Imp, 'Imp_IFF_hi',
features='partner.ISO', obs='reporter.ISO')
partner_features
| partner.ISO | AGO | ALB | AND | ARE | ARG | ARM | ATG | AUS | AUT | AZE | ... | URY | USA | VCT | VEN | VNM | VUT | YEM | ZAF | ZMB | ZWE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| reporter.ISO | year | |||||||||||||||||||||
| AGO | 2009 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 6.014820e+07 | 0.0 | 0 | 6.980492e+06 | 3.119279e+05 | 0.000000 | ... | 2.361993e+06 | 9.270820e+08 | 0 | 0.0 | 4.734647e+07 | 0 | 0.000000 | 1.016827e+09 | 1.060946e+05 | 309894.487818 |
| 2010 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 2.963158e+07 | 0.0 | 0 | 2.971282e+06 | 5.626929e+06 | 18935.636990 | ... | 1.644130e+06 | 8.268511e+08 | 0 | 0.0 | 2.970677e+07 | 0 | 0.000000 | 3.315930e+08 | 2.665205e+04 | 420951.156359 | |
| 2011 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 5.044258e+07 | 0.0 | 0 | 9.077825e+06 | 2.409418e+06 | 20266.842796 | ... | 1.016561e+06 | 8.601995e+08 | 0 | 0.0 | 2.451965e+07 | 0 | 0.000000 | 3.281534e+08 | 4.525653e+05 | 0.000000 | |
| 2012 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 2.055669e+08 | 0.0 | 0 | 7.483480e+06 | 1.354856e+07 | 22457.788787 | ... | 1.582490e+06 | 1.441162e+09 | 0 | 0.0 | 5.942382e+07 | 0 | 354866.063494 | 7.944924e+08 | 1.180964e+03 | 0.000000 | |
| 2013 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 5.839522e+07 | 0.0 | 0 | 1.569576e+07 | 8.770795e+06 | 1734.698687 | ... | 3.447835e+06 | 7.139003e+08 | 0 | 0.0 | 3.931132e+07 | 0 | 0.000000 | 4.272753e+08 | 2.855892e+05 | 0.000000 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZWE | 2011 | 0.0 | 0.0 | 0.0 | 5.290029e+07 | 2.310119e+07 | 0.0 | 0 | 2.325259e+06 | 3.171896e+05 | 0.000000 | ... | 2.721346e+04 | 6.462989e+08 | 0 | 0.0 | 5.288219e+06 | 0 | 0.000000 | 3.088036e+09 | 5.016118e+07 | 0.000000 |
| 2012 | 0.0 | 0.0 | 0.0 | 5.923175e+07 | 1.011059e+07 | 0.0 | 0 | 1.664879e+06 | 1.220841e+06 | 0.000000 | ... | 0.000000e+00 | 6.069423e+08 | 0 | 0.0 | 4.745996e+06 | 0 | 0.000000 | 1.242790e+09 | 1.704507e+08 | 0.000000 | |
| 2013 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000 | ... | 0.000000e+00 | 0.000000e+00 | 0 | 0.0 | 0.000000e+00 | 0 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2014 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000 | ... | 0.000000e+00 | 0.000000e+00 | 0 | 0.0 | 0.000000e+00 | 0 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2015 | 0.0 | 0.0 | 0.0 | 5.042948e+07 | 0.000000e+00 | 0.0 | 0 | 5.764013e+05 | 6.721992e+05 | 0.000000 | ... | 9.012925e+04 | 4.109564e+07 | 0 | 0.0 | 3.093477e+03 | 0 | 0.000000 | 6.474374e+08 | 0.000000e+00 | 0.000000 |
624 rows × 167 columns
biplot_PCA(partner_features, 10, 1, 2, show_loadings=True)
| PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| AGO | 0.064538 | -0.127775 | -0.044675 | -0.081190 | 0.015199 | -0.030508 | -0.020684 | -0.075254 | 0.026087 | 0.018444 |
| ALB | 0.044858 | 0.100240 | -0.085332 | 0.010378 | 0.222166 | -0.056160 | 0.016405 | -0.068677 | 0.098802 | -0.006931 |
| AND | 0.002434 | 0.006593 | -0.006567 | 0.005445 | -0.015707 | 0.021064 | -0.006253 | -0.004508 | 0.001468 | -0.016877 |
| ARE | 0.091433 | -0.038099 | 0.067933 | -0.046062 | 0.123878 | -0.061712 | -0.030800 | 0.190913 | -0.147428 | 0.060691 |
| ARG | 0.129513 | 0.139207 | -0.082968 | 0.019367 | -0.043149 | 0.085106 | -0.014389 | 0.038525 | 0.024086 | 0.000942 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| VUT | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| YEM | 0.040005 | -0.060918 | 0.007451 | 0.143530 | 0.037673 | 0.037506 | 0.085598 | 0.261330 | 0.309931 | -0.017295 |
| ZAF | 0.008847 | 0.007834 | 0.091261 | -0.022447 | 0.026785 | 0.001632 | -0.043753 | 0.104236 | -0.037778 | 0.124183 |
| ZMB | 0.040152 | 0.020620 | -0.037492 | 0.009481 | 0.093282 | -0.176393 | -0.033166 | 0.069316 | -0.109787 | 0.018559 |
| ZWE | 0.058102 | -0.112442 | 0.020694 | 0.215291 | 0.042842 | 0.010566 | 0.044425 | 0.149634 | 0.182897 | 0.017658 |
167 rows × 10 columns
display(Image(filename='Images/Top 10 destinations GER average Africa.png'))
biplot_PCA(partner_features, 10, 3, 4)
biplot_PCA(partner_features, 10, 5, 6)
scree_plot(partner_features)
partner_features_AFR = create_features(IFF_Dest_Imp_AFR, 'Imp_IFF_hi',
features='partner.ISO', obs='reporter.ISO')
partner_features_AFR
| partner.ISO | AGO | BDI | BEN | BFA | BWA | CAF | CIV | CMR | COG | COM | ... | STP | SWZ | SYC | TGO | TUN | TZA | UGA | ZAF | ZMB | ZWE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| reporter.ISO | year | |||||||||||||||||||||
| AGO | 2009 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.0 | 7.356525e+05 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 1.628157e+06 | 0.000000e+00 | 0.0 | 1.016827e+09 | 1.060946e+05 | 309894.487818 |
| 2010 | 0.0 | 0.0 | 0.0 | 0.0 | 2.025235e+05 | 0.0 | 5.577699e+07 | 53088.378146 | 1.402793e+08 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 252999.300576 | 1.137066e+08 | 5.529461e+03 | 0.0 | 3.315930e+08 | 2.665205e+04 | 420951.156359 | |
| 2011 | 0.0 | 0.0 | 0.0 | 0.0 | 5.501246e+03 | 0.0 | 6.262938e+06 | 257786.746458 | 3.466061e+07 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 4.623718e+07 | 1.745372e+06 | 0.0 | 3.281534e+08 | 4.525653e+05 | 0.000000 | |
| 2012 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.0 | 1.425377e+07 | 222663.224270 | 2.587614e+07 | 0.0 | ... | 8435.069683 | 0.0 | 0.0 | 0.000000 | 1.262796e+07 | 1.253616e+05 | 0.0 | 7.944924e+08 | 1.180964e+03 | 0.000000 | |
| 2013 | 0.0 | 0.0 | 0.0 | 0.0 | 2.945193e+05 | 0.0 | 2.125395e+06 | 0.000000 | 4.357466e+06 | 0.0 | ... | 957.605075 | 0.0 | 0.0 | 0.000000 | 4.438123e+06 | 1.035603e+05 | 0.0 | 4.272753e+08 | 2.855892e+05 | 0.000000 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZWE | 2011 | 0.0 | 0.0 | 0.0 | 0.0 | 8.741233e+07 | 0.0 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000e+00 | 1.295280e+06 | 0.0 | 3.088036e+09 | 5.016118e+07 | 0.000000 |
| 2012 | 0.0 | 0.0 | 0.0 | 0.0 | 3.187848e+07 | 0.0 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000e+00 | 2.729399e+06 | 0.0 | 1.242790e+09 | 1.704507e+08 | 0.000000 | |
| 2013 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.0 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2014 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000e+00 | 0.0 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000 | |
| 2015 | 0.0 | 0.0 | 0.0 | 0.0 | 8.142369e+06 | 0.0 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000e+00 | 8.238550e+06 | 0.0 | 6.474374e+08 | 0.000000e+00 | 0.000000 |
604 rows × 46 columns
biplot_PCA(partner_features_AFR, partner_features_AFR.shape[1], 1, 2)
features_data_std = StandardScaler().fit_transform(partner_features_AFR)
pca = PCA(n_components=partner_features_AFR.shape[1])
princ_comp = pca.fit_transform(features_data_std)
#outlying = (princ_comp[:,44] > 1.5) | (princ_comp[:,45] > 3e-15)
#outlying = (princ_comp[:,44] > 0.5) | (princ_comp[:,45] > 1.5e-15)
outlying = (princ_comp[:,0] > 6) | (princ_comp[:,1] > 4)
partner_features_AFR[outlying].index
MultiIndex([('AGO', '2010'),
('CIV', '2013'),
('EGY', '2012'),
('GHA', '2002'),
('MAR', '2005'),
('MAR', '2006'),
('MAR', '2007'),
('MAR', '2008'),
('MAR', '2009'),
('MAR', '2010'),
('MAR', '2011'),
('MAR', '2012'),
('NGA', '2011'),
('ZAF', '2005'),
('ZAF', '2006'),
('ZAF', '2008'),
('ZAF', '2010'),
('ZAF', '2011'),
('ZAF', '2012'),
('ZAF', '2013'),
('ZAF', '2014'),
('ZAF', '2015'),
('ZAF', '2016')],
names=['reporter.ISO', 'year'])
partner_features_AFR_noout = partner_features_AFR[~outlying]
biplot_PCA(partner_features_AFR_noout, 46, 1, 2)
biplot_PCA_classes(sector_features, 21, 1, 2, classes='UN_Intermediate_Region')
biplot_PCA_classes(sector_features, 21, 1, 2, classes='Income group (World Bank)')
biplot_PCA_classes(partner_features, 46, 1, 2)
biplot_PCA_classes(partner_features, 46, 1, 2, classes='Income group (World Bank)')
biplot_PCA_classes(partner_features_AFR, 46, 1, 2)
biplot_PCA_classes(partner_features_AFR, 46, 1, 2, classes='Income group (World Bank)')
biplot_PCA_classes(partner_features_AFR_noout, 46, 1, 2)
biplot_PCA_classes(partner_features_AFR_noout, 46, 1, 2, classes='Income group (World Bank)')
# Change column names because Vega complains when there is a period
data = IFF_Dest_AFR.reset_index().groupby(['reporter.ISO', 'partner.ISO']).mean() \
.reset_index().rename(columns={'reporter.ISO': 'r', 'partner.ISO': 'p'})\
.apply(lambda x: np.log10(x+1) if np.issubdtype(x.dtype, np.number) else x, axis=0)
alt.Chart(
data,
title='Mis-invoiced trade'
).mark_rect().encode(
alt.X('r:N', axis=alt.Axis(title='Reporting country')),
alt.Y('p:N', axis=alt.Axis(title='Partner country')),
color=alt.Color('Imp_IFF_hi:Q',
scale=alt.Scale(scheme='inferno', reverse=True, nice=True)),
tooltip=[
alt.Tooltip('Imp_IFF_hi:Q', title='Mis-invoiced imports'),
alt.Tooltip('r', title='Reporting country'),
alt.Tooltip('p', title='Partner country')
]
).properties(width=550, height=550)
Concluding thoughts
Next steps